make event dispatching non-blocking #6762

base: develop

Conversation
@@ -0,0 +1,322 @@
use std::path::PathBuf;
Hey! Please add the copyright header to each source file. Thanks!
Ah, good call, will do. Is "Stacks Open Internet Foundation" still the correct copyright holder?
Also, there's a whole bunch of files that lack that header, I'm wondering if there's a good way to automate this.
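One way to automate it would be a small one-off tool. This is only a sketch of the idea: the header text, the `MARKER` string, and the function names are placeholders I made up, not anything the project actually uses.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Placeholder header and detection marker -- both are assumptions, not the
// project's actual license text or tooling.
const HEADER: &str = "// Copyright (C) 2024 <copyright holder>\n\
                      // SPDX-License-Identifier: GPL-3.0-only\n\n";
const MARKER: &str = "// Copyright";

/// Prepends HEADER to the file unless it already starts with MARKER.
/// Returns true if the file was modified.
fn add_header(path: &Path) -> io::Result<bool> {
    let contents = fs::read_to_string(path)?;
    if contents.starts_with(MARKER) {
        return Ok(false);
    }
    fs::write(path, format!("{HEADER}{contents}"))?;
    Ok(true)
}

/// Recursively applies `add_header` to every `.rs` file under `dir`.
fn add_headers_recursively(dir: &Path) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            add_headers_recursively(&path)?;
        } else if path.extension().and_then(|e| e.to_str()) == Some("rs") {
            add_header(&path)?;
        }
    }
    Ok(())
}
```

Checking `starts_with(MARKER)` keeps the tool idempotent, so it can also run in CI as a lint.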
Codecov Report: ❌ The project check failed because the head coverage (74.08%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

@@            Coverage Diff             @@
##           develop    #6762      +/-   ##
===========================================
+ Coverage    67.63%   74.08%    +6.45%
===========================================
  Files          586      587        +1
  Lines       362403   362732      +329
===========================================
+ Hits        245099   268747    +23648
+ Misses      117304    93985    -23319

... and 361 files with indirect coverage changes.
... instead of using unnamed tuples and long parameter lists.
This will allow us to output warnings if the (non-blocking) delivery gets too far behind, because we can tell how long it took between enqueuing the event and actually sending it. This commit adds another migration to said database, so I slightly refactored the migration code.
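The "how far behind" check can be as simple as stamping each payload when it is enqueued and comparing at delivery time. A minimal sketch, where the struct, function name, and threshold are all illustrative placeholders rather than the PR's actual code:

```rust
use std::time::{Duration, Instant};

// Illustrative threshold -- not a value taken from the PR.
const LAG_WARN_THRESHOLD: Duration = Duration::from_secs(30);

struct QueuedPayload {
    id: u64,
    enqueued_at: Instant, // stamped when the payload is enqueued
}

/// Returns the enqueue-to-send lag if it exceeds the warning threshold.
fn delivery_lag_warning(payload: &QueuedPayload) -> Option<Duration> {
    let lag = payload.enqueued_at.elapsed();
    if lag > LAG_WARN_THRESHOLD {
        // Real code would emit a warn-level log mentioning payload.id here.
        Some(lag)
    } else {
        None
    }
}
```

Using `Instant` rather than wall-clock time keeps the measurement monotonic, so clock adjustments can't produce spurious warnings.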
98b3395 to 0d62bd4
This commit is the main implementation work for stacks-network#6543. It moves event dispatcher HTTP requests to a separate thread. That way, a slow event observer doesn't block the node from continuing its work. Only if your event observers are so slow that the node is continuously producing events faster than they can be delivered, will it eventually start blocking again, because the queue size for pending requests is bounded (at 1,000 right now, but I picked that number out of a hat, happy to change it if anyone has thoughts).

Each new event payload is stored in the event observer DB, and its ID is then sent to the subthread, which will make the request and then delete the DB entry. That way, if a node is shut down while there are pending requests, they're in the DB ready to be retried after restart via `process_pending_payloads()` (which blocks until completion). So that's exactly as before (except that previously there couldn't have been more than one or two pending payloads).
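The shape described above maps naturally onto `std::sync::mpsc::sync_channel`. A minimal sketch, assuming made-up names (`spawn_worker`, `deliver`) rather than the PR's actual API:

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread;

// The bound mentioned in the PR description.
const QUEUE_SIZE: usize = 1_000;

fn deliver(_payload_id: u64) {
    // Real code: load the payload from the event observer DB, POST it to
    // the observer over HTTP, then delete the DB row.
}

/// Spawns the worker thread and returns the sending half of the bounded
/// channel. `send` only blocks once QUEUE_SIZE payload IDs are in flight.
fn spawn_worker() -> SyncSender<u64> {
    let (tx, rx) = sync_channel::<u64>(QUEUE_SIZE);
    thread::spawn(move || {
        // The loop ends once all senders are dropped.
        for payload_id in rx {
            deliver(payload_id);
        }
    });
    tx
}
```

Sending only the ID over the channel (not the payload itself) is what makes the DB the source of truth: a crash loses nothing, because undelivered rows are still on disk.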
This fixes [this integration test failure](https://github.com/stacks-network/stacks-core/actions/runs/20749024845/job/59577684952?pr=6762), caused by the fact that event delivery wasn't complete by the time the assertions were made.
Doing this work in the RunLoop implementations' startup code is *almost* the same thing, but not quite, since the nakamoto run loop might be started later (after an epoch 3 transition), at which point the event DB may already have new items from the current run of the application, which should *not* be touched by `process_pending_payloads`. This wasn't a problem before, but now that the DB backs the actual queue of the (concurrently running) EventDispatcherWorker, it has become one.
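One way to make the "don't touch newer rows" constraint concrete is an ID watermark captured before the worker starts. This is purely an illustration of the invariant, with a `Vec` of `(id, payload)` rows standing in for the event observer DB; it is not the PR's actual code or necessarily its chosen fix:

```rust
/// Drains rows whose ID is at or below `watermark` (i.e. rows left over
/// from a previous run), leaving rows enqueued by the concurrently
/// running worker untouched. Returns how many rows were processed.
fn process_pending_payloads(rows: &mut Vec<(u64, String)>, watermark: u64) -> usize {
    let before = rows.len();
    rows.retain(|(id, _payload)| {
        if *id <= watermark {
            // Real code: send the HTTP request, blocking until it
            // succeeds, then delete the row from the DB.
            false
        } else {
            true
        }
    });
    before - rows.len()
}
```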
This is like 72437b2, but it works for all the tests instead of only the one. While only that one test very obviously failed, the issue exists for pretty much all of the integration tests, because they rely on the test_observer to capture all relevant data. Things are usually fast enough that we've only seen one blatant failure so far, but (1) it's going to be flaky (I can create a whole lot of test failures by adding a small artificial delay to event delivery), and (2) it might actually be *hiding* test failures (in some cases, e.g. neon_integrations::deep_contract, we're asserting that certain things are *not* in the data, and if the data is incomplete to begin with, those assertions are moot).
478efa3 to d5fa2fc
When switching runloops at the epoch 2/3 transition, this ensures that the same event dispatcher worker thread is handling delivery, which in turn ensures that all payloads are delivered in order.
As per this thread: stacks-network#6795 (review), I used the same table/column name and semantics that we use elsewhere for the same purposes. Also fixed a comment typo.
…event-dispatcher-tweaks
Thanks Hank for the tip!
Not sure why this wasn't caught in the pre-commit hook; I'd have assumed the checks are the same.
The logic is slightly tricky here, because the size of the queue (the max number of in-flight requests before we start blocking the thread) is implemented through the `bound` parameter of the `sync_channel`, but those two values aren't actually the same. See the comment at the top of `EventDispatcherWorker::new()` for details.
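The authoritative explanation lives in that comment, but one common reason the two values differ in designs like this is that the worker holds one message it has already received while its request is in flight, on top of whatever sits in the channel. A hypothetical mapping (my assumption, not the PR's actual formula) might look like:

```rust
/// Maps a configured queue size (max in-flight requests before the
/// enqueuing thread blocks) to the `bound` passed to `sync_channel`.
/// Hypothetical off-by-one: the worker itself holds one dequeued message.
fn channel_bound(queue_size: usize) -> usize {
    // A queue size of 0 means "no in-flight requests allowed":
    // sync_channel(0) is a rendezvous channel, where every send blocks
    // until the worker receives, so dispatch is effectively synchronous.
    queue_size.saturating_sub(1)
}
```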
See the discussion in stacks-network#6543 for some background.
/// to `true`, as no in-flight requests are allowed.
/// ---
/// @default: `1_000`
pub event_dispatcher_queue_size: usize,
This value ultimately ends up as the `bound` argument to `sync_channel`, which is of type `usize`.
I don't know if we have any concerns about having a platform-dependent number type on the configuration object -- if yes, we can also use something else here. Arguably, any reasonable value for this setting should fit into 16 bits anyway. If you need your queue to be bigger than 64k, you should reconsider your architecture.
addresses #6543
Note
This should be ready for an initial review, but note that this PR also includes the commits from #6795. Once #6795 is merged, they'll disappear from here.
This is the true diff between the two branches.
Checklist
- Documentation (`docs/rpc/openapi.yaml` and `rpc-endpoints.md` for v2 endpoints, `event-dispatcher.md` for new events)
- New clarity functions have corresponding PR in `clarity-benchmarking` repo